Document Summarization using Wikipedia
نویسندگان
چکیده
© Document summarization using Wikipedia Krishnan Ramanathan, Yogesh Sankarasubramaniam, Nidhi Mathur, Ajay Gupta HP Laboratories HPL-2009-39 Single Document Summarization, Wikipedia, ROUGE Although most of the developing world is likely to first access the Internet through mobile phones, mobile devices are constrained by screen space, bandwidth and limited attention span. Single document summarization techniques have the potential to simplify information consumption on mobile phones by presenting only the most relevant information contained in the document. In this paper we present a language independent single-document summarization method. We map document sentences to semantic concepts in Wikipedia and select sentences for the summary based on the frequency of the mapped-to concepts. Our evaluation on English documents using the ROUGE package indicates our summarization method is competitive with the state of the art in single document summarization. External Posting Date: February 21, 2009 [Fulltext] Approved for External Publication Internal Posting Date: February 21, 2009 [Fulltext] Published and presented at the First International Conference on HCI, Allahabad, India. Jan 20-23, 2009 Copyright the First International Conference on HCI Document summarization using Wikipedia Krishnan Ramanathan, Yogesh Sankarasubramaniam, Nidhi Mathur, Ajay Gupta HP Laboratories 24, Salarpuria arena, Hosur Road, Adugodi, Bangalore, India {krishnan_ramanathan,yogesh,nidhim,ajay.gupta}@hp.com Abstract. Although most of the developing world is likely to first access the Internet through mobile phones, mobile devices are constrained by screen space, bandwidth and limited attention span. Single document summarization techniques have the potential to simplify information consumption on mobile phones by presenting only the most relevant information contained in the document. In this paper we present a language independent single-document summarization method. We map document sentences to semantic concepts in Wikipedia and select sentences for the summary based on the frequency of the mapped-to concepts. Our evaluation on English documents using the ROUGE package indicates our summarization method is competitive with the state of the art in single document summarization. Although most of the developing world is likely to first access the Internet through mobile phones, mobile devices are constrained by screen space, bandwidth and limited attention span. Single document summarization techniques have the potential to simplify information consumption on mobile phones by presenting only the most relevant information contained in the document. In this paper we present a language independent single-document summarization method. We map document sentences to semantic concepts in Wikipedia and select sentences for the summary based on the frequency of the mapped-to concepts. Our evaluation on English documents using the ROUGE package indicates our summarization method is competitive with the state of the art in single document summarization.
منابع مشابه
A new graph based text segmentation using Wikipedia for automatic text summarization
The technology of automatic document summarization is maturing and may provide a solution to the information overload problem. Nowadays, document summarization plays an important role in information retrieval. With a large volume of documents, presenting the user with a summary of each document greatly facilitates the task of finding the desired documents. Document summarization is a process of...
متن کاملTweet Contextualization (Answering Tweet Question) - the Role of Multi-document Summarization
The article presents the experiments carried out as part of the participation in the Tweet Contextualization (TC) track of INEX 2013. In our system there are three major sub-systems; i) Offline multi-document summarization, ii) Focused IR and iii) online multi-document Summarization. The Offline multi-document summarization system is based on document graph, clustering and sentence compression....
متن کاملACL 2013 MultiLing Pilot Overview
The 2013 Association for Computational Linguistics MultiLing Pilot posed a task to measure the performance of multilingual, single-document, summarization systems using a dataset derived from many Wikipedias. The objective of the pilot was to assess automatic summarization of multilingual text documents outside the news domain and the potential of using Wikipedia articles for such research. Thi...
متن کاملEnhancing Single-Document Summarization by Combining RankNet and Third-Party Sources
We present a new approach to automatic summarization based on neural nets, called NetSum. We extract a set of features from each sentence that helps identify its importance in the document. We apply novel features based on news search query logs and Wikipedia entities. Using the RankNet learning algorithm, we train a pair-based sentence ranker to score every sentence in the document and identif...
متن کاملEnhancing Query-oriented Summarization based on Sentence Wikification
Query-oriented summarization is primarily concerned with synthesizing an informative and well-organized summary from a document collection for a given query. In the existing summarization methods, each individual sentence in the document collection is represented as Bag of Words (BOW). In this paper, we propose a novel framework which improves query-oriented summarization via sentence wikificat...
متن کامل